Internet Info 1997 December

Internet Message Format | 1997-02-19 | 6KB

Received: (from daemon@localhost) by services.bunyip.com (8.6.10/8.6.9)
    id LAA13819 for urn-ietf-out; Thu, 24 Oct 1996 11:18:31 -0400
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1])
    by services.bunyip.com (8.6.10/8.6.9) with SMTP id LAA13812
    for <urn-ietf@services.bunyip.com>; Thu, 24 Oct 1996 11:18:20 -0400
Received: from acl.lanl.gov by mocha.bunyip.com with SMTP
    (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA26156
    (mail destined for urn-ietf@services.bunyip.com);
    Thu, 24 Oct 96 11:18:18 -0400
Received: from legiron.acl.lanl.gov (legiron.acl.lanl.gov [128.165.147.188])
    by acl.lanl.gov (8.7.3/8.7.3) with SMTP id JAA05603;
    Thu, 24 Oct 1996 09:18:12 -0600 (MDT)
Message-Id: <2.2.32.19961024152549.006b6fac@acl.lanl.gov>
X-Sender: rdaniel@acl.lanl.gov
X-Mailer: Windows Eudora Pro Version 2.2 (32)
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Thu, 24 Oct 1996 09:25:49 -0600
To: Terry Allen <tallen@fsc.fujitsu.com>, urn-ietf@bunyip.com
From: Ron Daniel <rdaniel@acl.lanl.gov>
Subject: Re: [URN] Unicode for NSS query
Sender: owner-urn-ietf@services.bunyip.com
Precedence: bulk
Reply-To: Ron Daniel <rdaniel@acl.lanl.gov>
Errors-To: owner-urn-ietf@bunyip.com

Thus spoke Terry Allen (at least at 03:52 PM 10/23/96 -0700)
[on the subject of UNICODE, aka 10646, for URNs]

> - why care?  the NSS is supposed to be opaque

Which means? I think it means we can't go around inferring structure
in arbitrary namespaces. Opaque does not mean that we have to take
whatever comes. There can be particular requirements on the characters
allowed.

I think this WG should make an explicit decision on trying to go
to UNICODE rather than just defaulting to ASCII. The reasons for
doing so:
  1) The IAB workshop tells us to try to do so.
  2) Not all the namespaces in the world use the Latin alphabet
     without accents.
  3) Isn't it time we start getting away from the US-ASCII assumption?

It may be that the group decides not to do UNICODE because of
technical objections.
That would be fine with me; I just want the decision to be explicit.

> - does this imply that a) NSSs should be formed originally
>   in Unicode, or that b) NSSs in other coded character sets
>   must be translated/transliterated into Unicode in forming
>   URNs, or c) something else?

What I assume happens is that namespaces are defined in terms of
glyphs, not coded character sets. Browsers have the job of taking
some coded representation of the glyph and translating it (if necessary)
to a 10646 coded version of the glyph. Then they do UTF-8 and %encoding
before sending it on the wire to a resolver.

(Note that the NAPTR resolution scheme is probably going to find it
impossible to get people to resolvers if that decision has to be made
on non-ASCII characters in the URN. Since NAPTR is intended for
Experimental rather than standards track this may be OK. I will take
a look at what can be done to deal with UNICODE in regexps.)

>As an example, suppose that I have an existing name space,
>well ordered in every respect, in ISO 8859-6 (Latin/Arabic).

Is this assumption of a character set appropriate? Is it not more likely
that the namespace will say "names are formed from the set of
glyphs [x-y]" (where x and y are denoting arbitrary glyphs, not the
Latin letters x and y), leaving the problem of how to code those
things as bits up to someone else?

>I want this name space to be used for URNs, and (supposing
>that the coded character set is not an obstacle at this
>point) get it registered.  As the upper half of 8859-6 is
>not a subset of Unicode, per the present syntax draft,

Do you mean that:
  a) The glyphs in the upper half of 8859-6 are not in UNICODE
  b) The glyphs are there but the bit patterns of their coded
     representation differ
  c) (b) plus non-unique mappings from 8859-6 to the same glyph
     in UNICODE

>I can't use it directly in an URN (although doing so would
>pose no problem I can think of, supposing that it is a rule
>of the name space that its names are in 8859-6).
>
>When I translate that 8859-6 name into Unicode I have more
>than one possible outcome (depending on whether I keep it
>simple, using 0621--064A or use Unicode code points that
>include diacritical marks or use Unicode code points that
>indicate glyph variants of a letter, such as 06AA, "Arabic
>Letter Swash Kaf," which is lexically the same as 0643,
>"Arabic Letter Kaf" or specify some ligatures).

Right. We have been assuming that the user enters a URN into
a browser using their local language/script/... and the browser
has the job of making it into UNICODE/UTF-8/%encoded. Are there
guidelines already in place on how, when a user wants an "A" plus
"grave" (or whatever), that is to be encoded? Is it reasonable for
us to cite such guidelines as the way namespaces should be encoded?
(Also, is it SHOULD be encoded or MUST be encoded?)

> Say
>I can have outcomes A, B, and C, all of them legitimate
>representations of my 8859-6 name in Unicode.  Are
>urn:mynamespace:A, urn:mynamespace:B, and urn:mynamespace:C
>equivalent?

I think that we can't mandate that they be lexically equivalent.
That requires FAR too much knowledge on the part of all URN software.
It may be that a resolver can be made smart enough that, when dealing
with a particular language and alphabet, it can recognize "A with grave"
as equivalent to "A" and "grave". I think that is outside the bounds
of what we standardize.

The place to attack this problem of A, B, and C all being legitimate
UNICODE representations of some sequence of glyphs is when the glyphs
are first translated to UNICODE. If there are recommendations on
preferred encodings, we should cite those. If not, then we should
decide on which is more important - UNICODE's capabilities or
unambiguous encoding.

Ron Daniel Jr.                     email: rdaniel@lanl.gov
Advanced Computing Lab             voice: +1 505 665 0597
MS B287                              fax: +1 505 665 4939
Los Alamos National Laboratory     http://www.acl.lanl.gov/~rdaniel/
Los Alamos, NM, USA  87545
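
The pipeline discussed in the message (local coded character set, then
10646/UNICODE, then UTF-8, then %encoding) and the "A with grave" versus
"A" plus "grave" ambiguity can be illustrated with a minimal Python sketch.
The helper name encode_nss is hypothetical, and the optional NFC
normalization step is an assumption of this sketch, not something the
message proposes or any URN draft mandates.

```python
import unicodedata
from urllib.parse import quote


def encode_nss(nss: str, normalize: bool = False) -> str:
    """Hypothetical helper: UTF-8 encode a namespace-specific string,
    then %-encode every byte outside the unreserved ASCII set."""
    if normalize:
        # Assumption of this sketch: pick one canonical form (NFC)
        # at the moment the glyphs are first turned into UNICODE.
        nss = unicodedata.normalize("NFC", nss)
    # quote() encodes str input as UTF-8 by default before escaping.
    return quote(nss, safe="")


# Two legitimate UNICODE spellings of the same glyph.
precomposed = "\u00C0"   # LATIN CAPITAL LETTER A WITH GRAVE
decomposed = "A\u0300"   # "A" followed by COMBINING GRAVE ACCENT

print(encode_nss(precomposed))                 # %C3%80
print(encode_nss(decomposed))                  # A%CC%80
print(encode_nss(decomposed, normalize=True))  # %C3%80

# A local coded character set is translated to UNICODE first:
# byte 0xE3 in ISO 8859-6 is ARABIC LETTER KAF (U+0643).
kaf = b"\xe3".decode("iso8859_6")
swash_kaf = "\u06AA"     # ARABIC LETTER SWASH KAF, a glyph variant

print(encode_nss(kaf))                          # %D9%83
print(encode_nss(kaf) == encode_nss(swash_kaf)) # False
```

Without an agreed rule applied when the name is first translated, the
two spellings of the accented letter reach the resolver as different
byte strings, and the two forms of Kaf stay distinct in this sketch,
which never maps one onto the other. That is the lexical-equivalence
problem raised above.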